baySeq: Empirical Bayesian analysis of patterns of differential expression in count data
Abstract
This vignette gives a rapid introduction to the commands used to implement two methods of evaluating differential expression in Solexa-type, or count, data by means of the baySeq R package. For fuller details on the methods, consult Hardcastle & Kelly [1]. The major improvement in this release is the option to include region length when evaluating differential expression between genomic regions (e.g. genes); see Section 6.3.1 for more details.

We assume that we have discrete data from a set of sequencing or other high-throughput experiments, arranged in a matrix such that each column describes a sample and each row describes some entity for which counts exist. For example, the rows may correspond to the different sequences observed in a sequencing experiment; the data then consist of the number of times each sequence is observed in each sample. We wish to determine which, if any, rows of the data correspond to some pattern of differential expression across the samples.

This problem has been addressed for pairwise differential expression by the edgeR [3] package. However, baySeq takes an alternative approach that allows more complicated patterns of differential expression than simple pairwise comparison, and is thus able to cope with more complex experimental designs. We also observe that the methods implemented in baySeq perform at least as well as, and in some circumstances considerably better than, those implemented in edgeR [1].

baySeq uses empirical Bayesian methods to estimate the posterior likelihood of each of a set of models that define patterns of differential expression for each row. This approach begins by considering a distribution for the row defined by a set of underlying parameters for which some prior distribution exists. By estimating this prior distribution from the data, we are able to assess, for a given model about the relatedness of our underlying parameters for multiple libraries, the posterior likelihood of the model.

In forming a set of models upon the data, we consider which patterns are biologically likely to occur in the data. For example, suppose we have count data from some organism in condition A and condition B. Suppose further that we have two biological replicates for each condition, and hence four libraries A1, A2, B1, B2, where A1, A2 and B1, B2 are the replicates. It is reasonable to suppose that at least some of the rows may be unaffected by our experimental conditions A and B, and the count data for each sample in these rows will ...
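The data layout and the two candidate patterns for the four libraries A1, A2, B1, B2 can be made concrete with a small sketch. The following is not baySeq code (baySeq is an R package); it is a plain Python illustration under stated assumptions: the reads are made up, the model names `NDE` and `DE` and the helper `split_by_model` are hypothetical, and encoding a model as one group label per library is simply one convenient way to write down which libraries are assumed to share parameters.

```python
from collections import Counter

# Hypothetical reads per library (made-up data, for illustration only).
reads = {
    "A1": ["ACGT", "ACGT", "TTGA", "GGCC"],
    "A2": ["ACGT", "TTGA", "TTGA", "GGCC"],
    "B1": ["ACGT", "GGCC", "GGCC", "GGCC"],
    "B2": ["ACGT", "ACGT", "GGCC", "GGCC"],
}

libraries = list(reads)  # column order: A1, A2, B1, B2
per_library = {lib: Counter(seqs) for lib, seqs in reads.items()}

# Rows are the distinct sequences seen in any library.
sequences = sorted({s for seqs in reads.values() for s in seqs})

# counts[i][j] = number of times sequences[i] was observed in libraries[j].
counts = [[per_library[lib][seq] for lib in libraries] for seq in sequences]

# Each model assigns a group label to every library; libraries sharing a
# label are assumed to share the same underlying parameters.
models = {
    "NDE": [1, 1, 1, 1],  # no differential expression: all libraries alike
    "DE": [1, 1, 2, 2],   # A1, A2 share parameters; B1, B2 share others
}


def split_by_model(row, labels):
    """Group one row's counts according to a model's library labels."""
    groups = {}
    for count, label in zip(row, labels):
        groups.setdefault(label, []).append(count)
    return groups


for seq, row in zip(sequences, counts):
    print(seq, row, split_by_model(row, models["DE"]))
```

Under the `DE` labelling, a row's counts are split into the A replicates and the B replicates; under `NDE` all four counts fall into a single group. Deciding which labelling better explains a given row is the question the posterior likelihoods are meant to answer; this sketch only sets up the data and the candidate groupings and performs no inference.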
Similar resources
Generalised empirical Bayesian methods for discovery of differential data in high-throughput biology
Motivation: High-throughput data are now commonplace in biological research. Rapidly changing technologies and application mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, requiring further development cycles and a lack of ...
Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology
MOTIVATION High-throughput data are now commonplace in biological research. Rapidly changing technologies and application mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, requiring further development cycles and a lack of st...
Package 'tcc' Title Tcc: Differential Expression Analysis for Tag Count Data with Robust Normalization Strategies
April 26, 2017 Type Package Title TCC: Differential expression analysis for tag count data with robust normalization strategies Version 1.16.0 Author Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji Kadota Maintainer Jianqiang Sun , Tomoaki Nishiyama Description This package provides a series of functions for performing dif...
Title Tcc: Differential Expression Analysis for Tag Count Data with Robust Normalization Strategies
December 22, 2016 Type Package Title TCC: Differential expression analysis for tag count data with robust normalization strategies Version 1.14.0 Author Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji Kadota Maintainer Jianqiang Sun , Tomoaki Nishiyama Description This package provides a series of functions for performing ...
TCC: Differential expression analysis for tag count data with robust normalization strategies
The R/Bioconductor package, TCC, provides users with a robust and accurate framework to perform differential expression (DE) analysis of tag count data. We recently developed a multi-step normalization method (TbT; Kadota et al., 2012 [3]) for two-group RNA-seq data. The strategy (called DEGES) is to remove data that are potential differentially expressed genes (DEGs) before performing the data...
Publication date: 2013